Efficiently Producing the K Nearest Neighbors in the Skyline on Vertically Partitioned Tables

نویسندگان

  • Marlene Goncalves
  • Maria-Esther Vidal
چکیده

Criteria that induce a Skyline naturally represent user’s preference conditions useful to discard irrelevant data in large datasets. However, in the presence of high-dimensional Skyline spaces, the size of the Skyline can still be very large, making unfeasible for users to process this set of points. To identify the best points among the Skyline, the Top-k Skyline approach has been proposed. Top-k Skyline uses discriminatory criteria to induce a total order of the points that comprise the Skyline, and recognizes the best or top-k points based on these criteria. In this article the authors model queries as multi-dimensional points that represent bounds of VPT (Vertically Partitioned Table) property values, and datasets as sets of multi-dimensional points; the problem is to locate the k best tuples in the dataset whose distance to the query is minimized. A tuple is among the k best tuples whenever there is not another tuple that is better in all dimensions, and that is closer to the query point, i.e., the k best tuples correspond to the k nearest points to the query that are incomparable or belong to the skyline. The authors name these tuples the k nearest neighbors in the skyline. The authors propose a hybrid approach that combines Skyline and Top-k solutions and develop two algorithms: TKSI and k-NNSkyline. The proposed algorithms identify among the skyline tuples, the k ones with the lowest values of the distance metric, i.e., the k nearest neighbors to the multi-dimensional query that are incomparable. Empirically, we study the performance and quality of TKSI and k-NNSkyline. The authors’ experimental results show the TKSI is able to speed up the computation of the Top-k Skyline in at least 50% percent with respect to the state-of-the-art solutions, whenever k is smaller than the size of the Skyline. Additionally, the authors’ results suggest that k-NNSkyline outperforms existing solutions by up to three orders of magnitude. Efficiently Producing the K Nearest Neighbors in the Skyline on Vertically Partitioned Tables

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dominant and K Nearest Probabilistic Skylines

By definition, objects that are skyline points cannot be compared with each other. Yet, thanks to the probabilistic skyline model, skyline points with repeated observations can now be compared. In this model, each object will be assigned a value to denote for its probability of being a skyline point. When we are using this model, some questions will naturally be asked: (1) Which of the objects ...

متن کامل

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

Efficiently Evaluating Skyline Queries on RDF Databases

Skyline queries are a class of preference queries that compute the pareto-optimal tuples from a set of tuples and are valuable for multi-criteria decision making scenarios. While this problem has received significant attention in the context of single relational table, skyline queries over joins of multiple tables that are typical of storage models for RDF data has received much less attention....

متن کامل

A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater

The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...

متن کامل

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJIRR

دوره 3  شماره 

صفحات  -

تاریخ انتشار 2013